A deep learning pipeline for product recognition on store shelves
Recognition of grocery products on store shelves poses peculiar challenges.
First, the task mandates the recognition of an extremely high number of
different items, on the order of several thousand for small-to-medium shops, with
many of them featuring small inter- and intra-class variability. Then, available
product databases usually include just one or a few studio-quality images per
product (referred to herein as reference images), whilst at test time
recognition is performed on pictures displaying a portion of a shelf containing
several products and taken in the store by cheap cameras (referred to as query
images). Moreover, as the items on sale in a store as well as their appearance
change frequently over time, a practical recognition system should handle
seamlessly new products/packages. Inspired by recent advances in object
detection and image retrieval, we propose to leverage state-of-the-art
object detectors based on deep learning to obtain an initial product-agnostic
item detection. Then, we pursue product recognition through a similarity search
between global descriptors computed on reference and cropped query images. To
maximize performance, we learn an ad-hoc global descriptor with a CNN trained on
reference images using an image-embedding loss. Our system is
computationally expensive at training time but performs recognition rapidly
and accurately at test time.
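The recognition step described above, matching global descriptors of cropped query items against per-product reference descriptors, can be sketched as a nearest-neighbor search. This is a minimal NumPy stand-in: the descriptors would actually come from the learned CNN, and the function names here are illustrative, not the authors' API.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Global descriptors are typically L2-normalized so that the
    # dot product between two descriptors equals cosine similarity.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def recognize(query_descriptors, reference_descriptors):
    """Assign each cropped query item to its most similar reference product.

    query_descriptors: (Q, D) descriptors of detected item crops.
    reference_descriptors: (R, D) one descriptor per catalog (reference) image.
    Returns, for each query crop, the index of the best-matching reference.
    """
    q = l2_normalize(np.asarray(query_descriptors, dtype=np.float64))
    r = l2_normalize(np.asarray(reference_descriptors, dtype=np.float64))
    similarity = q @ r.T  # cosine-similarity matrix of shape (Q, R)
    return similarity.argmax(axis=1)

# Toy example: 3 reference products, 2 detected crops.
refs = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0]])
queries = np.array([[0.9, 0.1, 0.0],   # close to product 0
                    [0.0, 0.2, 0.8]])  # close to product 2
print(recognize(queries, refs))  # -> [0 2]
```

Because new products only require computing one reference descriptor (no retraining of the detector), this retrieval formulation handles catalog changes naturally.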
Real-time self-adaptive deep stereo
Deep convolutional neural networks trained end-to-end are the
state-of-the-art methods to regress dense disparity maps from stereo pairs.
These models, however, suffer a notable decrease in accuracy when exposed
to scenarios significantly different from the training set (e.g., real vs.
synthetic images). We argue that it is extremely unlikely to gather
enough samples to achieve effective training/tuning in any target domain, thus
making this setup impractical for many applications. Instead, we propose to
perform unsupervised and continuous online adaptation of a deep stereo network,
which allows for preserving its accuracy in any environment. However, this
strategy is extremely computationally demanding and thus prevents real-time
inference. We address this issue by introducing a new lightweight, yet effective,
deep stereo architecture, Modularly ADaptive Network (MADNet) and developing a
Modular ADaptation (MAD) algorithm, which independently trains sub-portions of
the network. By deploying MADNet together with MAD we introduce the first
real-time self-adaptive deep stereo system enabling competitive performance on
heterogeneous datasets.
Comment: Accepted at CVPR 2019 as an oral presentation. Code available at
https://github.com/CVLAB-Unibo/Real-time-self-adaptive-deep-stere
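The core idea of Modular ADaptation, updating only one sub-portion of the network per frame instead of back-propagating through the whole model, can be sketched with a simple scheduler. This is a toy illustration: the class name, the softmax sampling rule, and the moving-average reward are my simplifications, not the paper's exact heuristic.

```python
import numpy as np

rng = np.random.default_rng(0)

class ModularAdaptation:
    """Toy sketch of MAD-style module selection.

    Keep a score per sub-module estimating how much updating it has
    reduced the loss in the past, and each frame adapt only one module
    sampled according to those scores, keeping per-frame cost low.
    """

    def __init__(self, num_modules, temperature=1.0):
        self.scores = np.zeros(num_modules)
        self.temperature = temperature

    def pick_module(self):
        # Softmax sampling: modules whose past updates helped more
        # get selected more often, but every module keeps nonzero mass.
        logits = self.scores / self.temperature
        p = np.exp(logits - logits.max())
        p /= p.sum()
        return int(rng.choice(len(self.scores), p=p))

    def record(self, module, loss_before, loss_after):
        # Exponential moving average of the observed loss reduction.
        reward = loss_before - loss_after
        self.scores[module] = 0.9 * self.scores[module] + 0.1 * reward

mad = ModularAdaptation(num_modules=5)
m = mad.pick_module()  # choose one sub-portion to adapt on this frame
# In the real system an unsupervised (e.g., photometric) loss would be
# computed before and after updating only module m's parameters.
mad.record(m, loss_before=1.0, loss_after=0.8)
```

The unsupervised loss comes from the stereo pair itself (no ground-truth disparity), which is what makes continuous online adaptation practical.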
Computer Vision and Deep Learning for retail store management
The management of a supermarket or retail store is a quite complex process that requires the coordinated execution of many different tasks (e.g., shelf management, inventory, surveillance, customer support...). Thanks to recent advancements in technology, many of those repetitive tasks can be completely or partially automated. One key technological requirement is the ability to understand a scene based only on information acquired by a camera; for this reason, we will focus on computer vision techniques to solve management problems inside a grocery retail store. We will address two main problems: (a) how to automatically detect and recognize products exposed on store shelves and (b) how to obtain a reliable 3D reconstruction of an environment using only information coming from a camera. We will tackle (a) both in a constrained version, where the objective is to verify the compliance of observed items to a planned disposition, and in an unconstrained one, where no assumptions on the observed scenes are made. As for (b), a good solution represents one of the first crucial steps for the development and deployment of low-cost autonomous agents able to safely navigate inside the store, either to carry out management jobs or to help customers (e.g., an autonomous cart or shopping assistant). We believe that algorithms for depth prediction from stereo or monocular cameras are good candidates for the solution of this problem. The current state-of-the-art algorithms, however, rely heavily on machine learning and can hardly be applied in the retail environment due to problems arising from the domain shift between the data used to train them (usually synthetic images) and the deployment scenario (real indoor images). We will introduce techniques to adapt those algorithms to unseen environments without the need for costly ground-truth data and in real time.
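The constrained version of task (a), verifying the compliance of observed items to a planned disposition, reduces to diffing the recognized shelf layout against the planogram. A minimal sketch, assuming a simple (row, slot) -> product-id representation of both layouts; the function and field names are illustrative.

```python
def compliance_report(planned, observed):
    """Compare a planned shelf disposition with the recognized one.

    planned / observed: dict mapping (shelf_row, slot) -> product id,
    where observed ids come from a detection + recognition step.
    Returns slots that are missing, misplaced, or unexpected.
    """
    missing = {k: v for k, v in planned.items() if k not in observed}
    misplaced = {k: (planned[k], observed[k])
                 for k in planned.keys() & observed.keys()
                 if planned[k] != observed[k]}
    unexpected = {k: v for k, v in observed.items() if k not in planned}
    return missing, misplaced, unexpected

planned = {(0, 0): "cereal_A", (0, 1): "cereal_B"}
observed = {(0, 0): "cereal_A", (0, 1): "cereal_C"}
missing, misplaced, unexpected = compliance_report(planned, observed)
print(misplaced)  # -> {(0, 1): ('cereal_B', 'cereal_C')}
```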
LatentSwap3D: Semantic Edits on 3D Image GANs
3D GANs have the ability to generate latent codes for entire 3D volumes
rather than only 2D images. These models offer desirable features like
high-quality geometry and multi-view consistency, but, unlike their 2D
counterparts, complex semantic image editing tasks for 3D GANs have only been
partially explored. To address this problem, we propose LatentSwap3D, a
semantic edit approach based on latent space discovery that can be used with
any off-the-shelf 3D or 2D GAN model and on any dataset. LatentSwap3D relies on
identifying the latent code dimensions corresponding to specific attributes by
feature ranking using a random forest classifier. It then performs the edit by
swapping the selected dimensions of the image being edited with the ones from
an automatically selected reference image. Compared to other latent space
control-based edit methods, which were mainly designed for 2D GANs, our method
on 3D GANs provides remarkably consistent semantic edits in a disentangled
manner and outperforms others both qualitatively and quantitatively. We show
results on seven 3D GANs (pi-GAN, GIRAFFE, StyleSDF, MVCGAN, EG3D, StyleNeRF,
and VolumeGAN) and on five datasets (FFHQ, AFHQ, Cats, MetFaces, and CompCars).
Comment: The paper has been accepted by ICCV'23 AI3DC
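The two-step edit described above, rank latent dimensions by attribute relevance, then swap the top-ranked dimensions with those of a reference code, can be sketched as follows. Note the hedge: the paper ranks features with a random-forest classifier; this self-contained NumPy version substitutes a simple per-class mean-difference ranking as a stand-in, and all names are illustrative.

```python
import numpy as np

def rank_dimensions(codes, labels):
    """Rank latent dimensions by how well they separate a binary attribute.

    Stand-in for the paper's random-forest feature importances: score each
    dimension by the absolute difference of its per-class means.
    codes: (N, D) latent codes; labels: (N,) 0/1 attribute labels.
    """
    codes = np.asarray(codes, dtype=np.float64)
    labels = np.asarray(labels)
    gap = np.abs(codes[labels == 1].mean(axis=0) -
                 codes[labels == 0].mean(axis=0))
    return np.argsort(gap)[::-1]  # most attribute-relevant dims first

def latent_swap(code, reference_code, ranked_dims, k):
    """Overwrite the k most attribute-relevant dimensions of `code` with
    those of a reference code showing the target attribute; all other
    dimensions are left untouched, keeping the edit disentangled."""
    edited = np.array(code, dtype=np.float64, copy=True)
    dims = ranked_dims[:k]
    edited[dims] = np.asarray(reference_code, dtype=np.float64)[dims]
    return edited

# Toy data in which dimension 2 carries the attribute.
codes = np.array([[0.0, 1.0, -1.0], [0.1, 0.9, -0.9],
                  [0.0, 1.1,  1.0], [0.1, 1.0,  0.9]])
labels = np.array([0, 0, 1, 1])
order = rank_dimensions(codes, labels)
edited = latent_swap(codes[0], codes[2], order, k=1)
print(order[0], edited)  # dimension 2 should rank first and be swapped
```

In the full method the edited code is then fed back through the (2D or 3D) generator, which is why the approach works with any off-the-shelf GAN without retraining.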